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Abstract 

Given a Reproducing Kernel Hilbcrt Space ("H, (., .)) of real-valued functions 
and a suitable measure ft over the source space, we decompose H as sum of a 
subspace of centered functions for ft and its orthogonal in H. This decomposition 
leads to a special case of ANOVA kernels, for which the functional ANOVA 
representation of the minimal norm interpolator can be elegantly derived. The 
proposed kernels appear to be particularly convenient for analyzing the effect of 
each (group of) variable(s) and computing sensitivity indices without recursivity. 
Keywords: Kernel Methods, Global Sensitivity Analysis, Sobol-Hoeffding 
Decomposition, Gaussian Process Regression, Computer Experiments 

1. Introduction 

Let / be a real- valued function defined over D C R . We assume that / is 
costly to evaluate and that we want to study some global properties of / such as 
the influence of each variable on /. As the number of evaluations of / is limited, 
it may be unaffordable to run sensitivity analysis methods directly on /. Thus, 
it can be helpful to replace / by a mathematical approximation for performing 
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such studies Q| • We propose in this article a class of functional approximations 
that is well suited for performing global sensitivity analysis. First of all, we 
present some background in sensitivity analysis, interpolation in RKHS, and 
a class of kernels from the state of the art called ANOVA kernels. We then 
construct RKHS of zero mean functions and derive a new class of ANOVA 
kernels that is well suited for sensitivity analysis. Finally, we illustrate the use 
of those kernels on a classical example from the sensitivity analysis literature. 

1.1. Sensitivity analysis 

Let us consider / g L 2 (D,n), where D = D% x ••• X Dd is a product space 
of bounded sets fljCl and ji = (A\ X • • • X fid is a product probability measure 
over D. The purpose of global sensitivity analysis is to analyze the influence of 
all (groups of) variables on /. A common approach is to study the variance of 
/(X) where X is a random vector with distribution /z. 

If d = 1, any g £ L 2 (D,n) can be canonically decomposed as a sum of a 
constant plus a zero mean function, 



.9 = J d g(s)dfi(s) + \g - g(s)dfi(s] 
so that we have a geometric decomposition of L 2 (D, /n): 

L 2 (D,h)=LI(D,li)®L 2 q (D,h) (1) 



where L 2 (D,/i) denote the subspace of constant functions and Lq(D,h) the 
subspace of zero mean functions for \x. 

Similarly, if d > 1, the space L 2 (D, /i) has a tensor product structure (| 

d 

L 2 (AM)=0i 2 (A,M t ). (2) 

i=l 

Using Eq. [Hand the notation L 2 P {D, /i) = (g)f =1 L 2 p .{D h pa) for P 6 {0, l} d we 
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obtain 

d 



L 2 (£>,M)=0(i?(A,Mi)e^(A,Mi)) = L 2 P (D, (i). (3) 

Pe{o,i} d 



A key property is that the subspaces Lp and Lq are orthogonal whenever 
P Q. Given an arbitrary function / G L 2 (D,fi), the orthogonal projections 
of / onto those subspaces leads to the functional ANOVA representation 
(or Sobol-Hoeffding decomposition) of / into main effects and interactions: 

d 

/(x) = /o + fifa) + E fijfa'Bi) + ■■■ + /i,...,«*( x )- ( 4 ) 

?— 1 i<j 

Let us remark that fo can be seen as a constant function over D (i.e. an element 
of L 2 y n ), and each fi lt ... t i h (1 < fc < d, ij, . . . , G [1, d]) can be represented as 
an element of L 2 p ^{D, (j,) (7 denotes here . . . , ik}) by identifying it with 

/p (J ) :xefl — ► fi 1 ,...,i k {xi 1 ,. ■ .,x ih ) G K (5) 



where P(7) G {0, l} d with P(7) 4 = 1 if i G 7 and P(7)j = if i £ I. So 
the integral of fp(i) with respect to any of the variables indexed by i G I is 
zero. This representation of / gives an insight on the influence of each variable 
or couple of variables on /. For the constant term, the main effects, and the 
two-factor interactions, one gets the classical expressions 



fo = / /(x)d/i(x) 

J D 

fi(xi)= /(x)djti_i(x_j) - fo (6) 

Jd -w 

fi,j(xi, Xj) = / /(x)d^_ { . M} (x_ {lj} ) - fi(xi) - fj(xj) - fo 



where D-i := Yii&i D{ and := ®i£iPi- Similarly, the calculation of any 
// requires to have recursively calculated all the /j's for J G 7, which makes it 
cumbersome (if not practically impossible) to get higher order interactions. 
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Coming back to the case of a random vector X with distribution fi, the 
variance of the random variable /(X) can be decomposed as 
d 



var(/(X)) = X)var(/i(*0) + var (-M X ^)) + ' ' ' + var(/i,..., d (X)) (7) 



and the global sensitivity index 5/ for a subset of indices / is usually defined as 



Si represents the proportion of variance of /(X) explained by the interaction 
between the variables contained in /. The knowledge of the indices Si is very 
helpful for understanding the influence of the inputs, but the computation of 
the /j's is cumbersome when the evaluation of / is costly since they rely on the 
computation of the integrals of Eq[6] Following 7], it can then be advantageous 
to perform the sensitivity analysis on a surrogate model m approximating /. 

1.2. Optimal interpolation in RKHS 

The class of functional approximation techniques considered in this work, com- 
monly referred to as Kriging or Gaussian Process Regression in contemporary 
statistical learning settings, boils down to optimal interpolation in Reproducing 
Kernel Hilbert Spaces (RKHS). / is here assumed to be known at a set of points 
X = {X\, . . . , X n } € D. Given TL a RKHS of real- valued functions over D with 
kernel K (., .), the interpolator m of / at X that minimizes is [jj: 



where F = f(X) is the column vector of observations, k(.) is the column vector 
of functions (K (Xi, -))i<i< n an< ^ ^ ^ s * ne Gram matrix (K)ij = K(Xi, Xj). 

A striking fact about Eq|H]is that m can be an interpolator even if / ^ H. K 
can be any symmetric positive definite kernel and it has to be chosen in practice. 
This choice has a great impact on the resulting model, and it is customary to 



i<3 



Si = var(/ J (X / ))/var(/(X)). 



(8) 



m(x) = kixfK^F 



(9) 
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select K among family of parametric functions of positive type according to some 
prior knowledge about /, and to estimate the corresponding parameters based 
on observed data. We will focus here on a particular family of such kernels, 
called ANOVA, which are furthermore designed to offer good interpretability 
properties. The main contribution of this paper (in Section 3) deals with a 
special case of ANOVA kernels tailored for an improved disentanglement of 
multivariate effects. 

1.3. ANOVA kernels and a candidate ANOVA-like decomposition of m 

ANOVA kernels (See e.g. Q section 5.4 for a historic approach) have been 
proposed in the literature of multivariate regression for an enhanced inter- 
pretability of splines and related models. They are constructed [5] by taking 
tensor products of univariate kernels 1 + /c l , where 1 stands for a bias term and 
the k l, s are arbitrary symmetric definite positive kernels on Dj x Di (1 < i < d): 

d 

K ANO VA(x,y)=l[(l + k t (x l ,y l )) = l+ ]T Y{k\x llVl ). (10) 
i=i i 1 1 ..<;.< / 

Denoting by V and W the RKHS of functions defined over Di with respective 

reproducing kernels 1 and k l , Kan ova is in fact the reproducing kernel of the 

space Hanova = ®i = i(l l +%')- Now, back to Eq. [101 the particular structure 

of K anova allows us to develop the n x 1 vector k(x) of eq[9]as follows: 

k (-) = i+ E O k *(-) ( n ) 

IC{l,—,d,} iei 

where denotes a term-wise product. Injecting this relation in eq[9l we get: 

m(x) = l'K- 1 F+ E (O kl (^)) K^F 

ic{i,...,d} Vie/ / 



:{i,...,d} 

1*K 1 F+ YK^ix.YK-^) 

I<Z{l,...,d} iei 



(12) 



Noting m = l^K^F and m 7 (x) = ]J ieI k'fo^K^F (J C {l,...,d} and 
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I ^ 0), we obtain a development of m which looks quite similar to its FANOVA 
representation: 

d 

m(x) = to + y j m i {x i ) + } j rn ij j(xi ! j) H h mi,...,d(xi ! ... !(J ), (13) 

i— 1 i<^' 

where the m/'s have the nice feature of not requiring any recursive computation 
of integrals. However, the properties of the ANOVA representation are not 
respected since the to/'s are not necessarily zero mean functions, i.e. any two 
terms of the decomposition do not have to be orthogonal in L2- For example, 
if k l is an Ornstein-Uhlenbeck kernel [lj, it is known that V C H l . 

Let us remark that the submodels too and to,/ respectively belongs to the 
spaces I 1 ® • ■ • ® t d and ® ie/ H l 1 J , but that they are not necessarily 
orthogonal projection onto those spaces. In order to ensure that the decom- 
position of Eq. has the properties required in Eq. 2J we have to consider 
RKHS H l that are i^-orthogonal to the constant functions l l , ie RKHS of zero 
mean functions for [ii With such a construction, we would benefit from 

the advantages of the two decompositions: the meaning of the decomposition 
given by Eq. U for the analysis of variance and the easiness of computation of 
the to/'s from Eq. Q2] 

2. RKHS of zero mean functions 

We will show in this section how to extract a RKHS of zero mean functions 
from a RKHS with arbitrary symmetric definite positive kernel K (k if d = 1). 

2.1. Decomposition of one- dimensional RKHS 

Let % be a RKHS of functions defined over a compact set Del and ^ a 
finite Borel measure over D. Furthermore, we consider the couple of hypotheses: 



6 



(i) k : DxD^-Ris/i® /i-measurable. 
^ (ii) I \J k(s, s)dfi(s) < oo. 

As D is compact, any bounded kernel satisfies the condition (ii) so this 

hypothesis is not very restrictive. For example, usual stationary kernels such as 

the Gaussian, power-exponential and Matern kernels satisfy it. 

Proposition 1. Under H can be decomposed as a sum of two orthogonal 

sub-RKHS. H = Ho © T~L\ where Ho is a RKHS of zero-mean functions for fi, 
and its orthogonal Hi is at most 1- dimensional. 



Proof. Following (i), the integral operator I : % — ¥ E,, h i-> / h(s)dfj,(s) 

Jd 



is 



bounded since for h £ T-L 



\I{h)\< / \{h,k{^-))u\Ms)<\\h\\ H / VkMMs) (14) 

JD JD 

According to the Riesz representation theorem, there exists a unique R E H 
such that V7i € /(/i) = (h,R) H . If i?(.) = 0, then all / € % are centered 
functions for fx, so that Ho — H and Hi = {0}. If R(.) ^ 0, then "Hi = span(R) 
is a 1-dimensional sub-RKHS of H, and the subspace Ho of centered functions 
for /i can be defined by Ho = H± . □ 

Remark 1. For all x G D the value of R(x) can be calculated explicitly. In- 
deed, recalling that k(x, .) and R are respectively the representers in H of the 
evaluation functional at x and of the integral operator, we get: 

R(x) = (k(x,.),R) H = I(k(x,.)) = I k(x,s)dfi(s). (15) 

JD 

The reproducing kernels ko, k\ of Ho and Hi satisfy k = ko + k\. Let 7r 
denote the orthogonal projection onto Hi- Following [2j we obtain 



ko(x, y) = k(x, y) - n(k(x, .))(y) 
= k(x,y) 



k(x, s)dfi(s) / k(y,s)dfi(s) (16) 

D JD 



k(s,t)dn(s)dfj,(t) 

DxD 



2.2. Example 

Let us briefly illustrate the previous results for two usual kernels: 

b{x,y) =mm(x,y) and g(x, y) = exp {-(x - y) 2 ) , (17) 
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known as the Brownian and the Gaussian covariance kernels, respectively. Here 
D = [0, 5] and \i is the Lebesgue measure over D. Figure 1 represents sections 
of the reproducing kernels ki(x,y) and gi(x,y) (i E {0,1}) outcomes of the 
decomposition of b and g, for various values of y. 



(a) 6i(,0) 



(b) 6i(.,2) 



(c) (-,4) 





(d)fci(.,0) (e)fei(.,2) (f)*i(-,4) 

Figure 1: Representation of the sub kernels bi(.,y) and gi(.,y) for y = 0,2,4 and i=0,l. The 
dashed lines correspond to b\, g\ and the solid lines are for bo and go- 

We observe on this figure that bo(.,y) and go(-,y) take negative values and 
that they are zero mean functions (as elements of Ho). Moreover, bo(.,y) and 
bi(.,y) (respectively fc (.,y), ki(.,y)) arc orthogonal for the scalar product of 
their RKHS but are not orthogonal for L 2 (D,[i). 

2.3. Generalization for multi- dimensional RKHS 

The former decomposition of one-dimensional kernels leads directly to the 
decomposition of tensor product kernels 

d d 

X(x,y) = Y\k\ Xl , Vl ) = n(*S(*i.Wi) + W))- (18) 

i=l i=l 
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if the fc"s satisfy HQ] Since k^Xi^yi) is a 1-dimensional kernel, it can be seen 
as a bias term so this equation highlights the similarity between the usual ten- 
sor product kernels (power-exponential, Brownian, Matern, . . . ) and ANOVA 
kernels. 

3. A new class of kernels for sensitivity analysis 

We now propose a special class of ANOVA kernels, 
d 

K* ANOV A(x,y) = n(i + K(xi, yi )) = 1+ Il fc o( Xi '^)' ( 19 ) 

i=l IC{l,...,d}ieI 

where the /cq are obtained by decomposing kernels as in the previous section. 
Proposition 2. If m is a best predictor based on KanovA' 

mi =Hki(x i ) t K- 1 F (20) 
iei 

is the term of the functional ANOVA representation of m indexed by I. Hence, 
the decomposition of m given by Eq. \1S\ coincides with its functional ANOVA 
representation (Eq. [^). 

Proof. The kernels k l are associated to RKHS H l Q of zero-mean functions, so 
we have t l _I_l 2 W . The underlying RKHS associated to K ANOVA is 

n* ANOV A = II ( r ® n o) ( 21 ) 

i=l ^ ' 

where _L stands for the L 2 scalar product. The result follows. □ 

Corollary 1. Contrarily to usual ANOVA kernels, the class of K\nova en ~ 
sures that the terms mi are mutually orthogonal in the L 2 sense. 

As the expression of the submodels is simple, the computation of the sensi- 
tivity indices can be performed analytically and efficiently. 
Corollary 2. The sensitivity indices Si of m are given by: 

Si = var^X,)) = F'K-^Q^rQK-iF 

var(m(X)) F t K -i (©^(lnxn + A) - l„ Xn ) K^F 

where Ti is the n x n matrix Ti = J D k^ l (xi)]s^ l (xi) dfii(xi) and l nxn is the 
n x n matrix of 1. 
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Proof. The numerator is obtained by direct calculation: 
vartmxCXj)) = var ^Jk^a^K" 1 ^ 

= F*K- 1 ^ l4(a: i )l4(a: i ) t d W (a: i )^ K X F. 



(23) 



For the denominator, we obtain similarly 



var(m(X)) = F'KT 1 Q (^j (l„ x i+kj(a^)) (l„xi + Kixi))* cMa*)) K ' F 



- F t K _1 l„ xn K^ 1 F. 



(24) 



We then use the property that k^ix, .) is a zero mean function so we have 
/ D . (l«xi + k^^)) (lnxl + K( X i)) d^i(xi) = l nxn + T t . □ 

Conversely to the method developed in ^] , the computation of Si does not 
require here to compute all Sj for J C I. 

3.1. example: the g-function of Sobol 

In order to illustrate the use of the kernels Kanova we consider the so-called 
g-function of Sobol, defined over [0, l] d by 

g(x 1 ,...,x d )=T[ l 4g *- 2 l+°* witha fc >0. (25) 

fe =i 1 + afc 



and one particular advantage is 



This function is well known in the literature 
that the Sobol sensitivity indices associated to the variables Xi, i = 1, . . . , d can 
be obtained analytically: 

Si = TOF (26) 

ECU (i + 3tt^f) - 1 

Here we limit ourself to the case d — 2 and we choose a\ = 1, 02 = 2. Starting 
from a one-dimensional Matern 3/2 kernel 

fc(s, y) = (1 + 2 \x - y\) cxp(-2 |x - (27) 
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we can derive the expression of K \ NO va using Eq. [TH] and Q1JJ 

-l 



Kanova( x ^) = 1 1 



1 + k(x l ,y i ) 



fc(xi, s)d/x(s) / k(yi,s)d^(s) 







V 



k(s, i)d/i(s)d/i(t) 



(28) 

We then build the optimal interpolator m £ ^anova based on the ob- 
servation of g at 20 points of [0, l] 2 (those points steem from a LHS-maximin 
procedure). According to what we have seen, the function m can be split as a 
sum of 4 submodels mo, mi, m-2 and m.12 which are represented on Fig. 13.11 






(a) 9 



(b) m 



(c) m 






(d) mi (e) m,2 (f) mi2 

Figure 2: Representation of the g-function, the model m and all the submodels on [0, l] 2 . The 
z scale is the same on all graphs. 

We observe numerically that the mean value of mi, 7712 and m.12 is lower than 
le — 15 (in absolute value), corroborating that these functions are zero-mean. 
More generally, after numerical computations of the scalar products between 
any two functions of the set {mo,mi,ra2,mi2}, we observe that \(mi,mj) L2 \ < 
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le - 15 for I ± J. 



Using Eq.[22J the sensitivity indices calculated for m are S\ = 0.69, S2 — 0.30 
and 5*i2 = 0.02 (the sum is slighly different from 1 because of rounding up). 
Those figures can be compared to the analytical values given by Eq. [55] which 
are S\ = 0.675, S2 = 0.30 and S12 = 0.025. The accuracy of the computation 
of Global sensitivity indices can be judged satisfactory in this example. 

4. Concluding remark 

We proposed a new class of kernels for which the functional ANOVA de- 
composition of the mean predictor can be obtained analytically, without the 
usual recursive integral calculations for higher order interaction terms. This 
new class is a special case of usual ANOVA kernels, with particular univariate 
kernels so that an orthogonality to constants is respected. Up to a calculation 
or a tabulation of the integral of univariate kernels, the replacement of usual 
ANOVA kernels by the ones proposed here may be done at neglectable cost in 
applications, with substantial benefits for the model interprctability and global 
sensitivity analysis studies. 

The issue of the estimation of the parameters of K\ NOVA has not been raised 
yet in this article. This is however an important point for the practical use of 
those kernels. The use of the likelihood theory has been considered, but many 
points such as the links between the optimal parameters for K and the optimal 
parameters for the associated Kjlnqva needs to be studied in detail. Finally, 
since the pattern of the proof of Prop. [Jean be applied to any bounded operator 
on %, the perspectives for future research include a focus on other operators than 
the integral operator /, for example for building RKHS respecting orthogonality 
to a family of trend basis functions. 
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